skater scored an average of 9.0 compared to another who scored an average of 5.0. You will not know
what the skate routines looked like unless you watch them, but the score will already tell you that if
you were to watch them, you would expect to see that the one that scored 9.0 was executed in a more
visually pleasing way than the one that scored 5.0.
Frequency distributions have names for their important characteristics, including:
Center: Where along the distribution of the values do the numbers tend to center?
Dispersion: How much do these numbers spread out?
Symmetry: If you were to draw a vertical line down the middle of the distribution, does the
distribution shape appear as if the vertical line is a mirror, reflecting an identical shape on both
sides? Or do the sides look noticeably different — and if so, how?
Shape: Is the top of the distribution nicely rounded, or pointier, or flatter?
Just as average skating scores describe the visual appeal of an Olympic skate routine, you describe
a distribution by calculating and reporting numbers that measure each of these four
characteristics. These characteristics are what we mean by summary statistics for numerical variables.
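As a rough illustration of reporting one number per characteristic, the sketch below computes a mean (center), a standard deviation (dispersion), a skewness coefficient (symmetry), and an excess kurtosis (shape). The specific measures are an assumption chosen for illustration, as are the function name `describe` and the made-up scores; other measures of each characteristic could be used instead.

```python
import statistics

def describe(values):
    """Return one number for each of the four characteristics (illustrative choices)."""
    n = len(values)
    mean = statistics.fmean(values)        # center
    sd = statistics.pstdev(values)         # dispersion (population form, for simplicity)
    # Standardized third moment: 0 for a perfectly symmetric set of values.
    skewness = sum((v - mean) ** 3 for v in values) / (n * sd ** 3)
    # Standardized fourth moment minus 3: negative means flatter than a normal curve.
    excess_kurtosis = sum((v - mean) ** 4 for v in values) / (n * sd ** 4) - 3
    return {
        "center": mean,
        "dispersion": sd,
        "symmetry": skewness,
        "shape": excess_kurtosis,
    }

# Hypothetical scores, not data from the text.
scores = [2, 4, 4, 5, 6, 6, 9]
summary = describe(scores)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The function assumes the values are not all identical, because a standard deviation of zero would make the symmetry and shape measures undefined.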
Locating the center of your data
When you start exploring a set of numbers, an important first step is to determine what value they tend
to center around. This characteristic is called, intuitively enough, central tendency. Many statistical
textbooks describe three measures of central tendency: mean (which is the same as average), median,
and mode. You may assume these are the three optimal measures to describe a distribution (because
they all begin with m and are easy to remember). But all three have limitations, especially when
dealing with data obtained from samples in human research, as described in the following sections.
Arithmetic mean
The arithmetic mean, also commonly called the mean (or the average), is the most familiar and most
often quoted measure of central tendency. Throughout this book, whenever we use the two-word term
the mean, we’re referring to the arithmetic mean. (There are several other kinds of means besides the
arithmetic mean, which we describe later in this chapter.)
The mean of a sample is often denoted by the symbol m or by placing a horizontal bar over the
name of the variable, like $\bar{X}$. The mean is obtained by adding up the values and dividing by the
sample size, meaning the number of values. (If you are using software for this, make sure
missing values are excluded, or the calculation may fail or return a missing result.) Here’s a small sample of
numbers — the diastolic blood pressure (DBP) values of seven study participants (in mmHg)
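arranged in increasing numerical order: 84, 84, 89, 91, 110, 114, and 116. For the DBP sample:
$$\text{Arithmetic mean} = \frac{84 + 84 + 89 + 91 + 110 + 114 + 116}{7} = \frac{688}{7} \approx 98.3 \text{ mmHg}$$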
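The mean of the seven DBP values listed above can be checked with a few lines of Python. This is a minimal sketch: the helper name `arithmetic_mean` is hypothetical, and it assumes missing values are recorded as `None` or NaN, which is only one of several conventions software may use.

```python
import math

# DBP values (mmHg) for the seven study participants in the text.
dbp = [84, 84, 89, 91, 110, 114, 116]

def arithmetic_mean(values):
    """Add up the non-missing values and divide by how many there are."""
    # Assumption: a missing value is represented as None or NaN.
    kept = [v for v in values if v is not None and not math.isnan(v)]
    if not kept:
        raise ValueError("no non-missing values to average")
    return sum(kept) / len(kept)

dbp_mean = arithmetic_mean(dbp)
print(round(dbp_mean, 1))  # 98.3

# A missing reading is dropped before averaging, so the mean is unchanged.
print(arithmetic_mean(dbp + [None]) == dbp_mean)  # True
```

Dropping missing values also reduces the sample size used in the division, so it is worth reporting how many values were actually averaged.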
You can write the general formula for the arithmetic mean of the N values contained in the
variable X in several ways: